Preparing Data

To visualise the data in a meaningful way, I first need to prepare it. The first step is to load the necessary libraries, then read in the data and inspect it.

# loading libraries
library(googlesheets4)
library(leaflet)
library(sf)
library(RColorBrewer)
library(dplyr)
library(htmltools)
library(tidyverse)
library(ggplot2)
library(ggnewscale)
# skipping authentication (the sheet is readable via its sharing link)
gs4_deauth()
# reading file containing articles' information including country
articles <- read_sheet("https://docs.google.com/spreadsheets/d/1NFlbYvgNJCsr0uW5uyhvtZPhoj4UlpB0D9DB7rlhu6g/edit?usp=sharing", range = "Ark1")
articles
# A tibble: 60 × 10
   Author    Year Title Journal Abstract Keywords Country Questionnaire/Interv…¹
   <chr>    <dbl> <chr> <chr>   <chr>    <chr>    <chr>   <chr>                 
 1 Ammann,…  2023 Data… Data i… The art… Digital… Switze… Yes                   
 2 Finger,…  2019 Prec… Annual… Precisi… Big dat… Switze… No                    
 3 Rijswij…  2019 Digi… NJAS -… Digital… Advisor… New Ze… Yes                   
 4 Michels…  2020 Smar… Precis… This st… Digital… Germany Yes                   
 5 Ciruela…  2020 Digi… Sustai… This pa… Coopera… Spain   No                    
 6 Newton,…  2020 Farm… Agricu… This pa… Collabo… Austra… No                    
 7 Zhang, …  2023 Can … Enviro… This st… Agricul… China   Yes                   
 8 Groher …  2020 Digi… Animal  This st… Farm ch… Switze… Yes                   
 9 Pfeiffe…  2021 Unde… Agricu… This st… Dairy; … Germany Yes                   
10 Groher,…  2020 Stat… Precis… This pa… Digital… Switze… Yes                   
# ℹ 50 more rows
# ℹ abbreviated name: ¹​`Questionnaire/Interview`
# ℹ 2 more variables: Citations <dbl>, URL <chr>
# reading file containing articles' information including country (offline access)
# library(readxl)
# articles <- read_excel("data/articles.xlsx")

# reading GeoJSON file containing country boundaries
countries <- st_read("data/countries.geojson")
Reading layer `countries' from data source 
  `/Users/miakuntz/Documents/UNI/6. semester/spatial_analytics/spatial_final_proj/data/countries.geojson' 
  using driver `GeoJSON'
Simple feature collection with 255 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.6341
Geodetic CRS:  WGS 84
countries
Simple feature collection with 255 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.6341
Geodetic CRS:  WGS 84
First 10 features:
                  ADMIN ISO_A3                       geometry
1                 Aruba    ABW MULTIPOLYGON (((-69.99694 1...
2           Afghanistan    AFG MULTIPOLYGON (((71.0498 38....
3                Angola    AGO MULTIPOLYGON (((11.73752 -1...
4              Anguilla    AIA MULTIPOLYGON (((-63.03767 1...
5               Albania    ALB MULTIPOLYGON (((19.74777 42...
6                 Aland    ALA MULTIPOLYGON (((20.92018 59...
7               Andorra    AND MULTIPOLYGON (((1.707006 42...
8  United Arab Emirates    ARE MULTIPOLYGON (((53.86305 24...
9             Argentina    ARG MULTIPOLYGON (((-68.65412 -...
10              Armenia    ARM MULTIPOLYGON (((45.54717 40...

The articles data is read in as a tibble (a type of data frame) with 60 rows and 10 columns, and at first inspection it seems to include everything it should. The countries file is read in as a simple feature collection with 255 features and 2 fields (ADMIN and ISO_A3). Its geometry type is MULTIPOLYGON, which means that each country's geometry can consist of several polygons, for example a mainland and its islands. The dimension is XY, i.e. two-dimensional coordinates, and the bounding box shows that the data covers nearly the whole world. The coordinate reference system (CRS) is WGS 84, which is typical for global spatial data.
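
The properties read off the printed summary can also be queried directly. The following is a minimal sketch, not part of the original analysis: it uses the nc demo layer that ships with sf as a stand-in for the countries file, and the object names nc and geom_types are only illustrative.

```r
library(sf)

# demo layer shipped with sf; it stands in for data/countries.geojson here
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# geometry type(s) present in the layer
geom_types <- unique(as.character(st_geometry_type(nc)))
geom_types

# coordinate reference system and spatial extent
st_crs(nc)
st_bbox(nc)

# TRUE when the layer uses unprojected longitude/latitude coordinates
st_is_longlat(nc)
```

Running the same calls on the countries object should report the geometry type, bounding box and CRS shown in the printed output above.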

The next part of the code prepares the data for visualisation. Some rows in the “Country” column list several countries, so I first split those entries so that every country involved in an article is credited when the data is mapped. I then unnest the resulting list column so that each country gets its own row. Lastly, I merge the two data sets into one merged_data data frame, matching the Country column in articles against the ADMIN column in countries.

# splitting multiple countries in "Country" column into separate rows
articles$Country <- strsplit(articles$Country, ", ")

# unnesting the list column so that each country gets its own row
articles <- unnest(articles, Country)

# merging articles and countries based on country column
merged_data <- merge(articles, countries, by.x = "Country", by.y = "ADMIN", all.x = TRUE)
head(merged_data, n=3)
    Country
1 Australia
2 Australia
3   Belgium
                                                                                                                                                                       Author
1                                                                                                                           Newton, Joanna E.; Nettle, Ruth; Pryce, Jennie E.
2    Fielke, Simon J.; Taylor, Bruce M.; Jakku, Emma; Mooij, Martijn; Stitzlein, Cara; Fleming, Aysha; Thorburn, Peter J.; Webster, Anthony J.; Davis, Aaron; Vilas, Maria P.
3 Barnes, Andrew; De Soto, Iria; Eory, Vera; Beck, Bert; Balafoutis, Athanasios; Sánchez, Berta; Vangeyte, Jürgen; Fountas, Spyros; van der Wal, Tamme; Gómez-Barbero, Manuel
  Year
1 2020
2 2021
3 2019
                                                                                                                           Title
1                 Farming smarter with big data: Insights from the case of Australia's national dairy herd milk recording scheme
2                                   Grasping at digitalisation: turning imagination into fact in the sugarcane farming community
3 Influencing factors and incentives on the intention to adopt precision agricultural technologies within arable farming systems
                           Journal
1             Agricultural Systems
2           Sustainability Science
3 Environmental Science and Policy
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Abstract
1                                                                                                                                                                                                                                      This paper explores the use of big data in Australia's dairy industry and factors influencing farmer engagement. It identifies important dimensions of farmer demand for big data applications and highlights critical attributes of support services. The findings contribute to understanding collaborative governance arrangements that support farm engagement with big data.
2                                                                    This paper addresses nutrient runoff in Australia's Great Barrier Reef and the use of water quality monitoring and digital technology to mitigate the issue. It emphasizes the role of sugar cane farmers and the agricultural knowledge network in promoting sustainability. The concept of digi-grasping is introduced, along with the digi-MAST framework for assessing digital engagement and transformation. The framework guides the allocation of resources and actions for maximizing the impact of digital technological research outputs.
3 This study explores the adoption of precision agriculture technologies (PATs) among European farmers. A survey of 971 farmers in five countries reveals differences between current adopters and non-adopters of PATs. Non-adopters have more confidence in their field knowledge and are generally older. Non-adopters intending to adopt PATs in the future are more open to incentives. Attitudes towards investment certainty and payback periods vary. These findings suggest a gradient of adoption in European arable farming, requiring targeted policy interventions for sustainable agricultural production.
                                                                                  Keywords
1         Collaborative governance; Farm decision making; Herd testing; Livestock genomics
2 Agriculture; Digital technology; Responsible innovation; Social science; User experience
3      Arable farming; Incentives; Precision agriculture; Zero inflated Poisson regression
  Questionnaire/Interview Citations
1                      No        57
2                     Yes        20
3                     Yes        99
                                                                                                                         URL
1                                             https://www.sciencedirect.com/science/article/pii/S0308521X19309758?via%3Dihub
2 https://link.springer.com/article/10.1007/s11625-020-00885-9?utm_source=getftr&utm_medium=getftr&utm_campaign=getftr_pilot
3                                             https://www.sciencedirect.com/science/article/pii/S1462901118305471?via%3Dihub
  ISO_A3                       geometry
1    AUS MULTIPOLYGON (((158.8657 -5...
2    AUS MULTIPOLYGON (((158.8657 -5...
3    BEL MULTIPOLYGON (((4.815447 51...
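
The split-and-unnest step used earlier in this section can be illustrated on a toy table. This is only a sketch: articles_demo and long_demo are hypothetical names with made-up values, and tidyr's separate_rows() is shown as a one-call alternative to strsplit() followed by unnest(), not as the method used in this report.

```r
library(tidyr)

# toy table: the first article credits two countries in one cell
articles_demo <- data.frame(
  Title   = c("Paper A", "Paper B"),
  Country = c("Germany, Italy", "Spain"),
  stringsAsFactors = FALSE
)

# split on ", " and give every country its own row;
# the other columns are repeated for each new row
long_demo <- separate_rows(articles_demo, Country, sep = ", ")
long_demo
```

"Paper A" now appears twice, once per country, which is what allows both countries to be counted later on.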

Inspecting the merged_data data frame suggests that the columns from the two data files have merged successfully. Because the merge keeps all rows from articles (all.x = TRUE), this data frame only contains the countries, and their ISO codes, that appear in the articles file.
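
A visual check of the first rows does not reveal country names that failed to match a boundary. One way to test this is sketched below, assuming that a missing ISO_A3 value after the left join signals an unmatched name. The tables articles_demo and countries_demo are hypothetical, and their spelling mismatches are invented for illustration rather than taken from the real data.

```r
# toy stand-ins for the articles and countries tables (hypothetical values)
articles_demo <- data.frame(
  Title   = c("A", "B", "C"),
  Country = c("Germany", "USA", "Czechia"),
  stringsAsFactors = FALSE
)
countries_demo <- data.frame(
  ADMIN  = c("Germany", "United States of America", "Czech Republic"),
  ISO_A3 = c("DEU", "USA", "CZE"),
  stringsAsFactors = FALSE
)

# same kind of left join as used for merged_data
merged_demo <- merge(articles_demo, countries_demo,
                     by.x = "Country", by.y = "ADMIN", all.x = TRUE)

# rows without an ISO code did not find a matching boundary
unmatched <- unique(merged_demo$Country[is.na(merged_demo$ISO_A3)])
unmatched
```

An empty result from the same check on merged_data would indicate that every country name in the articles file has a counterpart in the ADMIN column.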

To get a quick impression of how the articles are distributed across countries, I add a bar chart of the five countries that appear most often in the articles data. To do this, I first tabulate the Country column of the merged_data data frame. I then convert this table to a data frame of its own, so that the counts can also be used later in the maps. Lastly, I arrange counts_df in descending order of the Count column and assign the top five countries to their own object. This object is presented as a bar chart for a quick view of which countries have the highest counts.

# calculating frequency count of each country
country_counts <- table(merged_data$Country)

# converting frequency counts to data frame
counts_df <- data.frame(Country = names(country_counts), Count = as.numeric(country_counts))
counts_df
                    Country Count
1                 Australia     2
2                   Belgium     1
3                    Brazil     2
4                     China     1
5            Czech Republic     1
6                   Denmark     3
7                   Finland     1
8                    France     2
9                   Germany    12
10                    Ghana     1
11                   Greece     3
12                  Hungary     4
13                    India     1
14                    Italy     7
15                   Latvia     1
16               Madagascar     1
17              Netherlands     1
18              New Zealand     2
19                   Norway     1
20                   Poland     1
21                   Russia     3
22                   Rwanda     1
23                 Slovenia     1
24                    Spain     2
25              Switzerland     6
26                   Taiwan     1
27                   Turkey     1
28                  Ukraine     1
29           United Kingdom     1
30 United States of America     3
# merging country counts with countries GeoJSON data
merged_geojson <- merge(countries, counts_df, by.x = "ADMIN", by.y = "Country", all.x = TRUE)
merged_geojson
Simple feature collection with 255 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.6341
Geodetic CRS:  WGS 84
First 10 features:
                          ADMIN ISO_A3 Count                       geometry
1                   Afghanistan    AFG    NA MULTIPOLYGON (((71.0498 38....
2  Akrotiri Sovereign Base Area    -99    NA MULTIPOLYGON (((32.84081 34...
3                         Aland    ALA    NA MULTIPOLYGON (((20.92018 59...
4                       Albania    ALB    NA MULTIPOLYGON (((19.74777 42...
5                       Algeria    DZA    NA MULTIPOLYGON (((8.60251 36....
6                American Samoa    ASM    NA MULTIPOLYGON (((-168.1605 -...
7                       Andorra    AND    NA MULTIPOLYGON (((1.707006 42...
8                        Angola    AGO    NA MULTIPOLYGON (((11.73752 -1...
9                      Anguilla    AIA    NA MULTIPOLYGON (((-63.03767 1...
10                   Antarctica    ATA    NA MULTIPOLYGON (((-162.409 -8...
# getting top five countries with highest number of studies
top_countries <- counts_df %>%
  arrange(desc(Count)) %>%
  head(5)

# converting 'Country' column to factor with ordered levels based on 'Count'
top_countries$Country <- factor(top_countries$Country, levels = top_countries$Country[order(top_countries$Count)])

# plotting bar chart of top five countries count
ggplot(top_countries, aes(x = Country, y = Count)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  labs(title = "Top Five Countries Count", x = "Country", y = "Count") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_y_continuous(breaks = seq(0, max(top_countries$Count), by = 2), labels = function(x) as.character(as.integer(x)))
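
In the counts printed above, four countries (Denmark, Greece, Russia and the United States of America) share a count of 3, so the fifth place is a tie and head(5) keeps only one of them. The sketch below shows one possible way to make this visible with dplyr's slice_max(). The object names counts_demo, top5_strict and top5_ties are illustrative, and the values are copied from the counts_df output.

```r
library(dplyr)

# subset of counts_df, values copied from the printed output
counts_demo <- data.frame(
  Country = c("Denmark", "Germany", "Greece", "Hungary", "Italy",
              "Russia", "Switzerland", "United States of America"),
  Count   = c(3, 12, 3, 4, 7, 3, 6, 3)
)

# arrange() + head(5) returns exactly five rows and drops the other tied countries
top5_strict <- counts_demo %>%
  arrange(desc(Count)) %>%
  head(5)

# slice_max() with with_ties = TRUE keeps every country tied for fifth place
top5_ties <- counts_demo %>%
  slice_max(Count, n = 5, with_ties = TRUE)

nrow(top5_strict)
nrow(top5_ties)
```

Whether to show five bars or all tied countries is a presentation choice; the point is only that the fifth bar in the chart is not uniquely determined by the data.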

Choropleth Map 1

# defining color palette
color_palette <- brewer.pal(9, "Greens")[-(1:3)]  # excluding the three lightest colors

# creating leaflet map
choropleth_map1 <- leaflet(data = merged_geojson) %>%
  setView(lng = 0, lat = 0, zoom = 2) %>%
  addProviderTiles("OpenStreetMap.Mapnik") %>%
  addPolygons(fillColor = ~colorNumeric(color_palette, domain = Count)(Count),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.7,
              highlightOptions = highlightOptions(
                weight = 2,
                color = "white",
                fillOpacity = 0.9
              ),
              label = ~paste0(ADMIN, ": ", Count))
# displaying first choropleth map
choropleth_map1

Choropleth Map 2

# creating leaflet map
choropleth_map2 <- leaflet(data = merged_geojson) %>%
  setView(lng = 0, lat = 0, zoom = 2) %>%
  addProviderTiles("OpenStreetMap.Mapnik") %>%
  addPolygons(fillColor = ~colorNumeric(color_palette, domain = Count)(Count),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.7,
              highlightOptions = highlightOptions(
                weight = 2,
                color = "white",
                fillOpacity = 0.9
              ),
              label = ~paste0(ADMIN, ": ", Count)) %>%
  addLegend("bottomright", pal = colorNumeric(color_palette, domain = unique(counts_df$Count)), values = unique(counts_df$Count), title = "Count") %>%
  addControl(html = as.character(tags$div(style = "text-align: center; background-color: white; padding: 10px; font-family: Arial, sans-serif; font-size: 16px; font-weight: bold;", 
                                          HTML(paste0("World map of studies in Smart Farming<br>",
                                                      "<span style='font-size: 12px;'>Hover mouse over country to view its count</span>")))),
              position = "topright") %>%
  addControl(html = as.character(tags$div(HTML(paste0("<h4>Top 5 Countries</h4>",
                                                      "<table>",
                                                      "<thead><tr><th>Country</th><th>Count</th></tr></thead>",
                                                      "<tbody>",
                                                      paste0("<tr><td>", top_countries$Country, "</td><td>", top_countries$Count, "</td></tr>", collapse = "\n"),
                                                      "</tbody>",
                                                      "</table>")))),
              position = "bottomleft")
# displaying second choropleth map
choropleth_map2
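
Both maps build the colour function inline, and the second map builds it once more for the legend. A possible simplification, sketched here under assumptions, is to create the palette function once and reuse it for the polygons and the legend. The names pal and counts_demo are hypothetical, the domain c(1, 12) reflects the range of the counts shown earlier, and the grey na.color is an arbitrary choice for countries without articles.

```r
library(leaflet)
library(RColorBrewer)

# same palette as in the maps: the six darkest "Greens"
color_palette <- brewer.pal(9, "Greens")[-(1:3)]

# hypothetical counts, including a country without any articles
counts_demo <- c(1, 3, 12, NA)

# one palette function, with an explicit colour for missing counts
pal <- colorNumeric(color_palette, domain = c(1, 12), na.color = "#D9D9D9")

# hex colours for each count; the NA entry receives the na.color
cols <- pal(counts_demo)
cols
```

In the maps, the same pal could then be passed as fillColor = ~pal(Count) in addPolygons() and as pal = pal in addLegend(), which keeps the fill colours and the legend in step.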

Centroid Map

# repairing invalid geometries
merged_geojson <- st_make_valid(merged_geojson)

# simplifying country geometries
# (for lon/lat data with s2 enabled, dTolerance is interpreted in metres; otherwise in CRS units)
simplified_geojson <- st_simplify(merged_geojson, preserveTopology = TRUE, dTolerance = 0.01)

# creating centroids for each country
centroids <- st_centroid(simplified_geojson)

# extracting coordinates from centroids
centroids <- st_coordinates(centroids)

# converting to data frame
centroids_df <- as.data.frame(centroids)

# renaming the columns
colnames(centroids_df) <- c("x", "y")

# combining with the count data
centroids_df <- cbind(centroids_df, Count = simplified_geojson$Count)
# creating centroid map
centroid_map <- ggplot() +
  geom_sf(data = countries, fill = "grey", alpha = 0.3) +
  new_scale("size") +
  geom_point(data = centroids_df, aes(x = x, y = y, size = Count, color = Count), alpha = 0.7) +
  scale_size(range = c(1, 10), name = "Count") +
  scale_color_gradient(low = "blue", high = "red", name = "Count") +
  theme_void() +
  coord_sf() +
  labs(title = "World Map of Studies in Smart Farming") +
  theme(plot.title = element_text(hjust = 0.5, size = 16)) +
  guides(color = guide_legend(override.aes = list(size = 3))) +
  guides(size = "none")
# displaying centroid map
centroid_map
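
The centroid table above contains a row for every country, including those without a count, which ggplot2 then drops with a warning. The sketch below shows one way to keep only the countries that have a count before computing centroids. It uses three toy square polygons instead of the real boundaries, and the names sq, demo, with_counts and centroids_demo are hypothetical.

```r
library(sf)

# helper that builds a 2 x 2 degree square polygon (toy geometry)
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 2, y0), c(x0 + 2, y0 + 2), c(x0, y0 + 2), c(x0, y0)
  )))
}

# three toy "countries" in WGS 84; B has no articles
demo <- st_sf(
  ADMIN    = c("A", "B", "C"),
  Count    = c(5, NA, 2),
  geometry = st_sfc(sq(0, 0), sq(10, 10), sq(20, 0), crs = 4326)
)

# keep only the features that have a count
with_counts <- demo[!is.na(demo$Count), ]

# centroids of the geometry column only, then coordinates as a plain data frame
xy <- st_coordinates(st_centroid(st_geometry(with_counts)))
centroids_demo <- data.frame(
  x     = xy[, "X"],
  y     = xy[, "Y"],
  Count = with_counts$Count
)
centroids_demo
```

Passing only the geometry to st_centroid() also avoids the warning that attributes are assumed to be constant over geometries, since no attributes are carried along.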